The density of states contains all information on the energetic quantities
of a statistical system, such as the mean energy, free energy, entropy, and
specific heat. As a specific application, we consider in this work a simple
lattice model for heteropolymers that is widely used for studying statistical
properties of proteins. For short chains, we have derived exact results
from conformational enumeration, while for longer ones we have developed a
multicanonical Monte Carlo variant of the nPERM-based chain-growth
method in order to simulate the density of states directly. For simplicity,
only two types of monomers, hydrophobic (H) and polar (P) residues, are
considered, and only the nearest-neighbour interaction between hydrophobic
monomers that are nonadjacent along the chain is taken into account. This
is known as the HP model for the folding of lattice proteins.

PACS numbers: 05.10.-a, 87.15.Aa, 87.15.Cc

1. Introduction 

Proteins perform numerous functions in a biological cell, e.g. control of
the transport processes of organelles, stabilisation of the cell structure,
and enzymatic catalysis of chemical reactions. It is well established that
the three-dimensional conformation of a protein in an aqueous environment
determines its biological function. Because of the enormous number of tasks
that must be fulfilled to ensure the stability of a biological system, a
large number of different proteins exists. All of them are built up of
chains of amino acid residues linked by peptide bonds. Since 20 different
amino acids occur in nature, a protein with N monomers can, in principle,
be formed from 20^N possible sequences. Only a small number of so-called
designing sequences, however, is actually realised in equilibrium. The
reason is that the protein must be stable against thermal fluctuations and
must not fold into a different shape, which would lead to a loss of its
associated function. Therefore, real proteins are supposed to possess a
funnel-like, deep global minimum in a rough free-energy landscape [1]. One
of the essential goals of computational protein research is to identify the
native state associated with the global free-energy minimum of a protein
with a given sequence of amino acid residues. Since the sequence of amino
acids is known to be responsible for the resulting fold, it is also
interesting to analyse which properties the sequences of such favoured
proteins have.

Unfortunately, computer simulations of real proteins are extremely
difficult due to the large number of degrees of freedom, which are
influenced by electrostatic, Lennard-Jones, hydrogen-bond, torsional, and
environmental interactions (for a review see, e.g., Ref. [2]). In order to
study the folding behaviour of proteins qualitatively, and also for sequence
analysis, simple lattice models are very practical. Nevertheless, the
determination of the lowest-energy states and their degeneracies remains
challenging. In fact, it was shown [3] that folding proteins within the HP
model [3], the simplest lattice model for proteins, is an NP-complete
problem. On the numerical side, one technical problem is that the polymers
are required to be self-avoiding, so updating the conformation in a Monte
Carlo simulation is quite involved. Two completely different methods are
widely used. The first applies a move set of transformations that change a
conformation of the full length N; in the second, chain growth, a new
monomer is attached to the end of a partial chain of length n < N until the
total chain length is reached. Both techniques work well in computer
simulations of polymers at comparatively high temperatures, for example in
investigations of the Θ-point transition between compact globular states
and random coils [5]. For studying the low-temperature behaviour of
heteropolymers, however, move sets are not very suitable, since the
transformations that usually belong to a move set, e.g. end and corner
flips, crankshafts, and pivot rotations, are inefficient for creating very
dense conformations. The transition between lowest-energy states and
compact globules represents a "conformational barrier" at low temperatures
that is much better circumvented with chain-growth based algorithms such as
PERM [6] and its new variants nPERM [7].

We are interested in the energetic thermodynamic properties of
heteropolymers at all temperatures, and we have therefore proposed a
multicanonical chain-growth algorithm [8] that samples the density of
states explicitly. The density of states is identical, up to normalisation,
with the canonical distribution at infinite temperature. Nevertheless, we
also obtain very accurate results in the low-temperature region. This is
due to multicanonical sampling [9], which flattens the canonical
distribution into a flat histogram, such that all energetic states are, in
principle, equally probable within the simulation. Finally, the canonical
distribution at any temperature, and thus all thermodynamic functions, can
be obtained by a simple reweighting procedure. This is possible only
because the multicanonical method samples the entire space of states,
including events that are canonically suppressed by many orders of
magnitude. In our simulations of the HP model for lattice proteins with
more than 40 monomers, we also had to sample the lowest-energy states,
whose probability in the density of states is of the order of 10^{-25},
since these states dominate the low-temperature behaviour of the protein.
Another problem is that the conformational transition between ground states
and globules appears precisely in this temperature region, causing a
conformational barrier that is best avoided, as described above, by an
adequate chain-growth algorithm. We therefore combined the multicanonical
method with the new PERM variants for simple and importance sampling,
nPERMss and nPERMis [7], respectively, to obtain densities of states with
high and uniform accuracy for all energies.
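The reweighting step described above can be written compactly. Denoting by W_muca(E) the multicanonical weights and by H_muca(E) the energy histogram of the multicanonical run (notation introduced here for illustration), the run samples energies with probability proportional to g(E) W_muca(E), which gives the relations below; a flat H_muca ensures uniform statistical accuracy over all energies, and W_muca ≡ 1 recovers sampling at infinite temperature, where the energy distribution is proportional to g(E) itself.

```latex
% multicanonical run: P_muca(E) \propto g(E) W_muca(E), ideally H_muca(E) \approx const
g(E) \propto \frac{H_{\mathrm{muca}}(E)}{W_{\mathrm{muca}}(E)},
\qquad
P_T(E) = \frac{g(E)\, e^{-E/k_B T}}{\sum_{E'} g(E')\, e^{-E'/k_B T}} .
```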



2. Density of States of HP Lattice Proteins 

For simplicity, we investigate lattice proteins that consist of only two
types of monomers: hydrophobic (H) and polar (P). This choice is made
since most of the amino acids occurring in nature can be grouped into these
two classes. Moreover, it is assumed that the protein folds mainly due to an
effective hydrophobic interaction. This means that a core of hydrophobic
monomers forms which is screened from the aqueous solvent by a shell of
polar (or hydrophilic) residues. The simplest form of the HP model takes
into account only the attractive interaction between nearest-neighbour H
monomers that are nonadjacent along the chain [3]:

E = -\sum_{\langle i, j>i+1 \rangle} \sigma_i \sigma_j ,    (1)

where σ_i = 0 (1) if the i-th monomer is polar (hydrophobic), and the sum
runs over pairs of monomers occupying nearest-neighbour lattice sites. The
partition sum of an HP lattice protein with fixed sequence at temperature T
is then given by

Z = \sum_{\{x\}} \exp\{-E(\{x\})/k_B T\},

where the sum is taken over all admissible conformations {x} of the
polymer. Sorting all conformational states with respect to their energies,
the partition sum can also be expressed in terms of the density (or
degeneracy) g(E) of states with energy E:

Z = \sum_i g(E_i) \exp\{-E_i/k_B T\}.    (2)

Fig. 1. Unique ground state of the 14mer with designing sequence 14.1 (dark
spheres: hydrophobic residues, light spheres: polar monomers).
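Equations (1) and (2) can be made concrete by brute force for short chains. The following is a minimal sketch, assuming a simple cubic lattice and a sequence given as a string of 'H' and 'P'; `hp_energy` and `density_of_states` are hypothetical names. Conformations are fixed at the origin, so translations are removed while rotated and reflected copies are counted separately. Since the number of walks grows exponentially with N, this pure-Python enumeration is practical only for short chains.

```python
from collections import Counter

# Unit steps of the simple cubic lattice (assumed lattice for this sketch).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def hp_energy(sequence, coords):
    """Energy (1): minus the number of H-H nearest-neighbour contacts
    between monomers that are not adjacent along the chain."""
    index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue  # sigma_i = 0 for polar monomers
        for dx, dy, dz in STEPS:
            j = index.get((x + dx, y + dy, z + dz))
            # j > i + 1 counts each contact once and skips chain neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def density_of_states(sequence):
    """Exact g(E) by depth-first enumeration of all self-avoiding walks
    that start at the origin."""
    n = len(sequence)
    g = Counter()
    coords = [(0, 0, 0)]
    occupied = {(0, 0, 0)}

    def grow():
        if len(coords) == n:
            g[hp_energy(sequence, coords)] += 1
            return
        x, y, z = coords[-1]
        for dx, dy, dz in STEPS:
            site = (x + dx, y + dy, z + dz)
            if site not in occupied:
                coords.append(site)
                occupied.add(site)
                grow()
                coords.pop()          # backtrack
                occupied.remove(site)

    grow()
    return dict(g)
```

For the 4mer HHHH, for instance, the 24 U-shaped walks bring monomers 0 and 3 into contact and have E = -1, while the remaining 126 of the 150 three-step walks have E = 0.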

Knowing g(E), the mean energy ⟨E⟩ of the system can be calculated by

\langle E \rangle(T) = \frac{\sum_i E_i\, g(E_i) \exp\{-E_i/k_B T\}}{\sum_i g(E_i) \exp\{-E_i/k_B T\}},    (3)

and the specific heat is given by the fluctuation formula

C_V(T) = \frac{1}{k_B T^2} \left( \langle E^2 \rangle - \langle E \rangle^2 \right).    (4)

Other energetic quantities related to the density of states are the
(Helmholtz) free energy

F(T) = -k_B T \ln \sum_i g(E_i) \exp\{-E_i/k_B T\}    (5)

and the entropy

S(T) = \frac{1}{T} \left[ \langle E \rangle(T) - F(T) \right].    (6)
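Equations (3) to (6) translate directly into code once g(E) is known. In this sketch the function name `thermodynamics` and the dictionary representation {E: g(E)} are assumptions, and units with k_B = 1 are used by default. The Boltzmann factors are shifted by the ground-state energy before exponentiation; without such a shift, low temperatures combined with large negative energies would overflow floating-point arithmetic.

```python
import math


def thermodynamics(g, T, k_B=1.0):
    """Mean energy (3), specific heat (4), free energy (5) and entropy (6)
    at temperature T from a density of states g = {E: g(E)}.

    k_B = 1 by default (assumption). Energies are shifted by E_min before
    exponentiation (log-sum-exp trick); the shift cancels in all averages.
    """
    beta = 1.0 / (k_B * T)
    e_min = min(g)
    weights = {E: n * math.exp(-beta * (E - e_min)) for E, n in g.items()}
    z_shifted = sum(weights.values())
    mean_e = sum(E * w for E, w in weights.items()) / z_shifted
    mean_e2 = sum(E * E * w for E, w in weights.items()) / z_shifted
    c_v = (mean_e2 - mean_e ** 2) / (k_B * T ** 2)
    ln_z = math.log(z_shifted) - beta * e_min   # undo the shift: ln Z
    free_energy = -k_B * T * ln_z
    entropy = (mean_e - free_energy) / T
    return mean_e, c_v, free_energy, entropy
```

As T → 0 the entropy approaches k_B ln g(E_min), and as T → ∞ it approaches k_B ln of the total number of states, which gives a quick consistency check on any computed g(E).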

3. Exact Enumeration of 14mers 

As a first example, we have investigated HP proteins with 14 monomers by
enumerating all possible conformations. This case is of particular
interest because there is only one sequence (HPHPH2PHPH2P2H, denoted 14.1
in the following) that is designing, i.e. the ground state of the
associated lattice protein is unique (up to translational, rotational, and
one reflection symmetry). It possesses n_H = 8 hydrophobic monomers, and
the ground-state energy is E_min = -8, since there are 8 hydrophobic
contacts (see Fig. 1). In order to understand the particular properties of
this protein, which has the lowest ground-state degeneracy among all 2^14
different 14mers, we compare it with three other 14mers having similar
properties (n_H = 8, E_min = -8) but different sequences and therefore
different ground-state degeneracies. The degeneracy of the lowest-energy
state of the sequences 14.2 (H2P2HPHPH2PHPH) and 14.3 (H2PHPHP2HPHPH2) is
twice that of the designing sequence 14.1, while that of sequence 14.4
(H2PHP2HPHPH2PH) is even four times as high. Figure 2 shows the densities of
states for the four sequences. Since the densities of the excited states do
not differ considerably (see Table 1), the low-temperature behaviour of
these proteins can vary only due to the different ground-state
degeneracies. Indeed, the specific heat shown in Fig. 3 exhibits a
pronounced low-temperature peak only for the designing sequence 14.1, while
it is largely suppressed for the other proteins. This peak indicates the
transition from the ground states to compact globular states. At higher
temperatures, the globules unfold and form random-coil conformations.

Fig. 2. Exact densities of states of exemplary 14mers with similar
properties (n_H = 8, E_min = -8) but different sequences.

Table 1. Exact total densities of states with energy E for the 14mers. The
entries of the table include all states that contribute to the partition
function Z_∞ for 14mers at infinite temperature (except translations),
which counts the number of self-avoiding random walks with (14 - 1) = 13
steps.

Fig. 3. Specific heat of the 14mers.
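Whether a sequence is designing is decided by counting its ground-state conformations up to translations, rotations, and reflections. A minimal way to do this, assuming a simple cubic lattice with its 48 origin-preserving symmetries (24 rotations, each optionally combined with a reflection), is to map every conformation to a canonical representative; the names `SYMMETRIES`, `canonical_form`, and `count_distinct` are hypothetical. Reading a chain backwards is not treated as a symmetry here, which is appropriate for a sequence that is not a palindrome.

```python
from itertools import permutations, product

# The 48 symmetries of the simple cubic lattice (assumed lattice) that fix
# the origin: every permutation of the axes combined with every sign choice.
SYMMETRIES = [(perm, signs)
              for perm in permutations(range(3))
              for signs in product((1, -1), repeat=3)]


def canonical_form(coords):
    """Symmetry-independent representative of a conformation.

    The chain is translated so that monomer 0 lies at the origin, and the
    lexicographically smallest of its 48 symmetric images is returned.
    """
    x0, y0, z0 = coords[0]
    shifted = [(x - x0, y - y0, z - z0) for x, y, z in coords]
    images = (
        tuple(tuple(signs[k] * site[perm[k]] for k in range(3)) for site in shifted)
        for perm, signs in SYMMETRIES
    )
    return min(images)


def count_distinct(conformations):
    """Number of symmetry-inequivalent conformations in a list of coordinate lists."""
    return len({canonical_form(c) for c in conformations})
```

Applied to the list of all lowest-energy conformations from an exhaustive enumeration, a result of 1 would identify a designing sequence in the sense used above; the 24 U-shaped conformations of a 4mer, for example, all reduce to a single class.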



4. Simulation of a 42mer: Lattice Model of Pectate Lyase C 

For lattice proteins with more than 20 monomers, exhaustive enumeration
becomes impractical, since the number of conformations grows exponentially
with the number of monomers [10]. More sophisticated search algorithms are
required to sample the conformational space. For this reason, we developed
a multicanonical chain-growth algorithm [8] that combines the advantage of
avoiding conformational barriers by using a PERM-based chain-growth method
[6, 7] with the capacity of a flat-histogram technique to sample the entire
energy space [9]. To achieve this, the canonical distributions provided by
PERM at each intermediate length of the growing chain must be flattened. As
usual, the multicanonical weights are determined by an iterative procedure
[9]. We applied this method to calculate the density of states of a lattice
42mer with sequence PH2PHPH2PHPHP2H3PHPH2PHPH3P2HPHPH2PHPH2P, which was
designed to model the ground-state properties of the parallel β helix of
the protein pectate lyase C [11, 12, 13]. The ground state is known to have
a low degeneracy: up to translations, rotations, and reflections there are
only 4 ground-state conformations, with energy E_min = -34. The density of
states spans 25 orders of magnitude, and our simulation method hit the
ground states frequently, such that the low-temperature properties of this
protein could be investigated with good accuracy. In Fig. 4 we show the
specific heat and the mean energy of the 42mer. The specific heat has two
peaks: the low-temperature transition between ground states and globules
occurs near T_0 ≈ 0.27, and the transition between globules and random
coils near T_1 ≈ 0.53.
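The iterative determination of multicanonical weights can be illustrated with a toy version of the simplest recursion, W_{n+1}(E) = W_n(E)/H_n(E), where H_n is the energy histogram of run n. In this sketch the chain-growth simulation is replaced by an idealized sampler that draws energies exactly from P_n(E) ∝ g(E) W_n(E) for a known, purely hypothetical g(E), and the flattening at every intermediate chain length used in the actual method is omitted; `muca_iterations` is a hypothetical name. Once the histogram is flat, -ln W(E) reproduces ln g(E) up to an additive constant.

```python
import math
import random
from collections import Counter


def muca_iterations(ln_g_true, n_iter=6, n_samples=100_000, seed=1):
    """Toy multicanonical weight recursion W_{n+1}(E) = W_n(E) / H_n(E).

    Idealized sampler (assumption): energies are drawn exactly from
    P_n(E) ~ g(E) W_n(E) instead of from a chain-growth simulation.
    Returns ln W after the last iteration; -ln W estimates ln g up to a constant.
    """
    rng = random.Random(seed)
    energies = sorted(ln_g_true)
    ln_w = {E: 0.0 for E in energies}   # W = 1: sampling at infinite temperature
    for _ in range(n_iter):
        logits = [ln_g_true[E] + ln_w[E] for E in energies]
        top = max(logits)
        probs = [math.exp(x - top) for x in logits]   # relative probabilities
        hist = Counter(rng.choices(energies, weights=probs, k=n_samples))
        for E in energies:
            if hist[E] > 0:             # unvisited bins keep their weight
                ln_w[E] -= math.log(hist[E])
    return ln_w
```

In this idealized setting H_n is proportional to g(E) W_n(E), so each update cancels the previous weight and the final estimate carries only the statistical error of the last iteration; bins not yet visited keep their weight and are reached once the frequently visited bins have been down-weighted.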



In Ref. [8] we also compared two 48mers with different ground-state
degeneracies and found there as well that a pronounced low-temperature peak
in the specific heat appears only for the example with the lower
ground-state degeneracy (which was about 5000 in that case).
5. Summary

We have discussed the relation between the low degeneracy of the low-lying
energy states and the appearance of a low-temperature transition between
compact globules and ground states of HP lattice proteins with 14 and 42
monomers. For this purpose, we calculated the density of states of the
14mers by exact enumeration of all possible conformations. In order to
simulate the density of states of the 42mer with the necessary high
accuracy, we developed a multicanonical chain-growth algorithm that enabled
us to sample the density of states over the entire energy space. As the
main qualitative conclusion, we find a correlation between the degeneracy
of the low-lying states and the sharpness of the transition to compact
globular states.